


A. Another universality result for neural oscillators

Neural Information Processing Systems

The universal approximation Theorem 3.1 immediately implies another universal approximation result. Thus y(t) solves the ODE (2.6), with initial condition y(0) = y'(0) = 0.

Reconstruction of a continuous signal from its sine transform.

Step 0 (Equicontinuity): We recall the following fact from topology. Define the odd extension F(τ) := f(τ) for τ ≥ 0 and F(τ) := −f(−τ) for τ < 0. Since F is odd, the Fourier transform of F is given by its sine transform; we provide the details below. The next step in the proof of the fundamental Lemma 3.5 needs the following preliminary result. By (B.3), this implies that ... It follows from Lemma 3.4 that for any input ... By the sine transform reconstruction Lemma B.1, there exists ... Indeed, Lemma 3.7 shows that time-delays of any given input signal can be approximated to any accuracy.

Step 1: By the fundamental Lemma 3.5, there exist ...

Step 2: It follows from Lemma 3.6 that there exists an oscillator ...

Step 3: Finally, by Lemma 3.8, there exists an oscillator network, ...
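The odd-extension step above can be written out explicitly; a minimal derivation, assuming the Fourier convention \(\hat{F}(\omega) = \int_{\mathbb{R}} F(\tau)\, e^{-i\omega\tau}\, d\tau\) (the paper's convention is not shown here):

```latex
F(\tau) :=
\begin{cases}
f(\tau), & \tau \ge 0,\\
-f(-\tau), & \tau < 0,
\end{cases}
\qquad
\hat{F}(\omega)
= \int_{\mathbb{R}} F(\tau)\, e^{-i\omega\tau}\, d\tau
= -2i \int_{0}^{\infty} f(\tau)\, \sin(\omega\tau)\, d\tau,
```

since the cosine part of the integrand is odd (even cosine times odd \(F\)) and integrates to zero, while the sine part is even. Recovering \(f\) on \([0,\infty)\) therefore reduces to inverting its sine transform.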




To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples

Kothapalli, Vignesh, Fatahibaarzi, Ata, Firooz, Hamed, Sanjabi, Maziar

arXiv.org Artificial Intelligence

Chain-of-thought (CoT) prompting combined with few-shot in-context learning (ICL) has unlocked significant reasoning capabilities in large language models (LLMs). However, ICL with CoT examples is ineffective on novel tasks when pre-training knowledge is insufficient. We study this problem in a controlled setting using the CoT-ICL Lab framework and propose meta-training techniques for learning novel abstract reasoning tasks in-context. Although CoT examples facilitate reasoning, we find that including them excessively during meta-training degrades performance when CoT supervision is limited. To mitigate this behavior, we propose CoT-Recipe, a formal approach for modulating the mix of CoT and non-CoT examples in meta-training sequences. We demonstrate that careful modulation via CoT-Recipe can increase the accuracy of transformers on novel tasks by up to 300%, even when no CoT examples are available in-context. We confirm the broader effectiveness of these techniques by applying them to pretrained LLMs (the Qwen2.5 series) on symbolic reasoning tasks, observing accuracy gains of up to 130%.
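One way to read "modulating the mix of CoT and non-CoT examples" is as a schedule over example types within meta-training sequences. A minimal sketch, assuming a hypothetical linear recipe that decays the CoT fraction over meta-training steps (the paper's actual CoT-Recipe parameterization is not given here; `cot_recipe_fraction` and its endpoints are illustrative):

```python
import random

def cot_recipe_fraction(step, total_steps, p_start=0.9, p_end=0.1):
    """Hypothetical linear schedule for the fraction of CoT examples."""
    t = step / max(total_steps - 1, 1)
    return p_start + (p_end - p_start) * t

def build_sequence(examples, step, total_steps, rng):
    """Tag each in-context example as CoT or non-CoT per the schedule."""
    p = cot_recipe_fraction(step, total_steps)
    seq = []
    for x in examples:
        if rng.random() < p:
            seq.append({"input": x, "cot": True})   # include rationale tokens
        else:
            seq.append({"input": x, "cot": False})  # answer-only example
    return seq

rng = random.Random(0)
early = build_sequence(range(8), step=0, total_steps=100, rng=rng)   # CoT-heavy
late = build_sequence(range(8), step=99, total_steps=100, rng=rng)   # mostly answer-only
```

Under this toy schedule, early meta-training sequences are CoT-heavy while late ones are mostly answer-only, matching the observation that excessive CoT during meta-training hurts when CoT supervision is scarce at evaluation time.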




Supplementary Material

Neural Information Processing Systems

The tradeoff weight λ is not the one in (10). Flipping h to [−1, 0] produces the same issue. The activated areas are shaded. So case (ii) is always preferred. We will use mini-batches of size b.


Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning (Supplementary File)

Pan Zhou

Neural Information Processing Systems

Appendix D gives the proofs of the main results in Sec. 4, including Theorem 1, which analyzes the escaping time of Lévy-driven SDEs, and Theorem 2 ... But these two types of randomness actually do not depend on each other. Note that, as shown in much of the literature, e.g. ... This type of SDE is usually called an "SDE with random coefficients", and usually appears in ... Besides, the flatness in this work is defined with respect to a general non-zero Radon measure. So it is promising to explore this invariant measure in the future. ... the SDEs which respectively correspond to Eqn. (4) and (5): dθ̃ ... Suppose Assumptions 1 and 2 hold.
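The escaping-time behavior of Lévy-driven SDEs can be illustrated numerically. A minimal sketch, assuming a 1-D toy loss f(θ) = θ²/2, symmetric α-stable noise sampled with the Chambers–Mallows–Stuck method, and an exit region Ω = [−1, 1] (none of these specific choices come from the paper):

```python
import numpy as np

def alpha_stable(alpha, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def escape_time(alpha=1.5, eps=0.2, dt=1e-3, max_steps=200_000, seed=0):
    """First exit time of d(theta) = -f'(theta) dt + eps dL_t from [-1, 1],
    with f(theta) = theta^2 / 2, discretized by Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for k in range(max_steps):
        # Gradient drift plus a Levy increment scaled by dt^(1/alpha).
        theta += -theta * dt + eps * dt ** (1 / alpha) * alpha_stable(alpha, 1, rng)[0]
        if abs(theta) > 1.0:
            return (k + 1) * dt
    return float("inf")
```

Because the α-stable increments have heavy tails, exits are typically driven by a single large jump rather than by many small fluctuations, which is the qualitative mechanism behind the escaping-time analysis for Lévy-driven (as opposed to Brownian-driven) SDEs.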